Characterizing concurrency mechanisms for NVIDIA GPUs under deep learning workloads

Authors

Abstract

We investigate the performance of concurrency mechanisms available on NVIDIA’s new Ampere GPU microarchitecture under deep learning training and inference workloads. In contrast to previous studies that treat the GPU as a black box, we examine scheduling at the microarchitectural level. We find that the lack of fine-grained preemption mechanisms, robust task prioritization options, and contention-aware thread block placement policies limits the effectiveness of NVIDIA’s concurrency mechanisms. In summary, the sequential nature of deep learning workloads and their fluctuating resource requirements and kernel runtimes make it difficult to execute such workloads on current NVIDIA hardware while maintaining consistently high utilization and low, predictable turnaround times.

Similar resources

ECC2K-130 on NVIDIA GPUs

A major cryptanalytic computation is currently underway on multiple platforms, including standard CPUs, FPGAs, PlayStations and GPUs, to break the Certicom ECC2K-130 challenge. This challenge is to compute an elliptic-curve discrete logarithm on a Koblitz curve over F_{2^131}. Optimizations have reduced the cost of the computation to approximately 2^77 bit operations in 2^61 iterations. GPUs are not des...

Optimizing Stencil Computations for NVIDIA Kepler GPUs

We present a series of optimization techniques for stencil computations on NVIDIA Kepler GPUs. Stencil computations with regular grids had been ported to the older generations of NVIDIA GPUs with significant performance improvements thanks to the higher memory bandwidth than conventional CPU-only systems. However, because of the architectural changes introduced with the latest generation of the...

Reservoir Simulation on NVIDIA Tesla GPUs

In this paper, we introduce our work on accelerating a black oil simulator using GPU-based parallel iterative linear solvers. We develop iterative linear solvers and several commonly used preconditioners on NVIDIA Tesla GPUs. These solvers and preconditioners are coupled with our in-house reservoir simulator. Numerical experiments show that our GPU-based black oil simulator is sped up around si...

Characterizing Computer Systems’ Workloads

The performance of any system cannot be determined without knowing the workload, that is, the set of requests presented to the system. Workload characterization is the process by which we produce models that are capable of describing and reproducing the behavior of a workload. Such models are imperative to any performance related studies such as capacity planning, workload balancing, performanc...

Evaluating On-Node GPU Interconnects for Deep Learning Workloads

Scaling deep learning workloads across multiple GPUs on a single node has become increasingly important in data analytics. A key question is how well a PCIe-based GPU interconnect can perform relative to a custom high-performance interconnect such as NVIDIA’s NVLink. This paper evaluates two such on-node interconnects for eight NVIDIA Pascal P100 GPUs: (a) the NVIDIA DGX-1’s NVLink 1.0 ‘hybrid ...

Journal

Journal title: Performance Evaluation

Year: 2021

ISSN: 0166-5316, 1872-745X

DOI: https://doi.org/10.1016/j.peva.2021.102234